MEMORYLESS NEAR-COLLISIONS, REVISITED 



MARIO LAMBERGER AND ELMAR TEUFL 



Abstract. In this paper we discuss the problem of generically finding
near-collisions for cryptographic hash functions in a memoryless way. A common
approach is to truncate several output bits of the hash function and to look for
collisions of this modified function. In two recent papers, an enhancement to
this approach was introduced which is based on classical cycle-finding techniques
and covering codes. This paper investigates two aspects of the problem of
memoryless near-collisions. Firstly, we give a full treatment of the trade-off
between the number of truncated bits and the success probability of the
truncation-based approach. Secondly, we demonstrate the limits of cycle-finding
methods for finding near-collisions by showing that, in contrast to the collision
case, a memoryless variant cannot match the query complexity of the "memory-full"
birthday-like near-collision finding method.

1. Introduction



The field of hash function research has developed significantly in the light of
the attacks on some of the most frequently used hash functions like MD4, MD5 and
SHA-1. As a consequence, academia and industry started to evaluate alternative
hash functions, e.g. in the SHA-3 initiative organized by NIST [15]. During this
ongoing evaluation, not only the three classical security requirements collision
resistance, preimage resistance and second preimage resistance are considered.
Researchers look at (semi-)free-start collisions, near-collisions,
distinguishers, etc. A "behavior different from that expected of a random oracle"
for the hash function is undesirable, as are weaknesses that are demonstrated
only for the compression function and not for the full hash function.

Coding theory and hash function cryptanalysis have gone hand in hand for quite 
some time now, where a crucial part of the attacks is based on the search for low- 
weight code words in a linear code (cf. [2, 4, 17] among others). In this paper, 
we want to elaborate on a newly proposed application of coding theory to hash 
function cryptanalysis. In [12, 13], it is demonstrated how to use covering codes to 
find near-collisions for hash functions in a memoryless way. We also want to refer to 
the recent paper [8] which considers similar concepts from the viewpoint of locality 
sensitive hashing. 

In all of the following, we will work with binary values, where we identify
$\{0,1\}^n$ with $\mathbb{Z}_2^n$. Let "+" denote the $n$-bit exclusive-or
operation. The Hamming weight of a vector $v \in \mathbb{Z}_2^n$ is denoted by
$w(v) = |\{i \mid v_i = 1\}|$ and the Hamming distance of two vectors by
$d(u, v) = w(u + v)$. The Handbook of Applied Cryptography [14, page 331]
defines near-collision resistance of a hash function $H$ as follows:

Definition 1 (Near-Collision Resistance). It should be hard to find any two inputs
$m, m^*$ with $m \ne m^*$ such that $H(m)$ and $H(m^*)$ differ in only a small
number of bits:

$$d(H(m), H(m^*)) \le \epsilon. \tag{1}$$



For ease of later use we also give the following definition: 

Definition 2. A message pair $m, m^*$ with $m \ne m^*$ is called an
$\epsilon$-near-collision for $H$ if (1) holds.

Collisions can be considered a special case of near-collisions with the parameter 
$\epsilon = 0$. The generic method for finding collisions for a given hash function is based
on the birthday paradox and attributed to Yuval [22]. There are well established 
cycle-finding techniques (due to Floyd, Brent, Nivasch, cf. [3, 11, 16]) that remove 
the memory requirements from an attack based on the birthday paradox (see also 
[20]). These methods work by repeated iteration of the underlying hash function 
where in all of these applications the function is considered to behave like a random 
mapping (cf. [7, 9]). 

In [12, 13], the question is raised whether or not the above mentioned cycle- 
finding techniques are also applicable to the problem of finding near-collisions. We 
now briefly summarize the ideas of [12, 13]. 

Since Definitions 1 and 2 include collisions as well, the task of finding near- 
collisions is easier than finding collisions. We now want to have a look at generic 
methods to construct near-collisions which are more efficient than the generic meth- 
ods to find collisions. 

In the following, let $B_r(x) := \{y \in \mathbb{Z}_2^n \mid d(x, y) \le r\}$
denote the Hamming ball (or Hamming sphere) around $x$ of radius $r$.
Furthermore, we denote by $S_n(r) := |B_r(x)| = \sum_{i=0}^{r} \binom{n}{i}$ the
cardinality of any $n$-dimensional Hamming ball of radius $r$.

A simple adaptation of the classical table-based birthday attack for finding
$\epsilon$-near-collisions is to start with an empty table, randomly select a
message $m$, compute $H(m)$, and then test whether the table contains an entry
$(H(m) + \delta, m^*)$ for some $\delta \in B_\epsilon(0)$ and arbitrary $m^*$.
If so, the pair $(m, m^*)$ is an $\epsilon$-near-collision. If not, $(H(m), m)$
is added to the table and the procedure is repeated. Then, we know the following:

Lemma 1 ([12]). Let $H$ be an $n$-bit hash function. If we assume that $H$ acts
like a random mapping, the average number of messages that we need to hash and
store in a table-based birthday-like attack before we find an
$\epsilon$-near-collision is $O(2^{n/2} S_n(\epsilon)^{-1/2})$.
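
The table-based search described above can be sketched in a few lines. The following is only an illustration under stated assumptions: `toy_hash` (SHA-256 truncated to `n_bits` bits), the helper names, and the use of a counter instead of random messages are all hypothetical choices made for this sketch, not part of the original attack description.

```python
import hashlib
from itertools import combinations

def toy_hash(msg: int, n_bits: int) -> int:
    """Hypothetical stand-in for H: SHA-256 truncated to n_bits bits."""
    digest = hashlib.sha256(msg.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)

def ball_offsets(n_bits: int, eps: int):
    """Enumerate all delta in B_eps(0), i.e. all n-bit words of weight <= eps."""
    for weight in range(eps + 1):
        for positions in combinations(range(n_bits), weight):
            delta = 0
            for p in positions:
                delta |= 1 << p
            yield delta

def table_near_collision(n_bits: int, eps: int):
    """Hash messages 0, 1, 2, ... and stop at the first eps-near-collision."""
    offsets = list(ball_offsets(n_bits, eps))
    table = {}                      # hash value -> message
    msg = 0
    while True:
        h = toy_hash(msg, n_bits)
        for delta in offsets:       # S_n(eps) table look-ups per message
            other = table.get(h ^ delta)
            if other is not None:
                return msg, other, msg + 1   # pair and number of hash calls
        table[h] = msg
        msg += 1
```

Each new message costs $S_n(\epsilon)$ table look-ups in this sketch, which illustrates why the memory accesses, not the hash calls, dominate in practice.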

Remark 1. We want to note that in this paper we are measuring the complexity 
of a problem by counting (hash) function invocations. This constitutes an adequate 
measure in the case of the memoryless algorithms in this paper, however the real 
computational complexity of the table-based algorithm above is dominated by the 
memory access, as the problem of searching for an $\epsilon$-near-collision in the table is
much harder than testing for a collision. 

The first straight-forward approach to apply the cycle-finding algorithms to the 
problem of finding near-collisions is a truncation based approach. 

Lemma 2. Let $H$ be an $n$-bit hash function. Let
$\tau_\epsilon : \mathbb{Z}_2^n \to \mathbb{Z}_2^{n-\epsilon}$ be a map that
truncates $\epsilon$ bits from its input at predefined positions. If we assume
that $\tau_\epsilon \circ H$ acts like a random mapping, we can apply a
cycle-finding algorithm to the map $\tau_\epsilon \circ H$ to find an
$\epsilon$-near-collision in a memoryless way with an expected complexity of
about

$$2^{(n-\epsilon)/2}.$$

Proof. Under the assumptions of the lemma, the results from [7, 9] are applied to a
random mapping with output length $n - \epsilon$. □
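
As an illustration of Lemma 2, the following sketch runs Floyd's cycle-finding algorithm on a truncated toy hash. The parameters `N_BITS` and `EPS`, the function `toy_hash` (SHA-256 cut down to `N_BITS` bits) and the restart strategy are assumptions made for this example only; real hash lengths are far beyond what this code can handle.

```python
import hashlib

N_BITS = 20   # toy output length n (assumption for this sketch)
EPS = 4       # number of truncated output bits

def toy_hash(msg: int) -> int:
    """Hypothetical stand-in for H: SHA-256 truncated to N_BITS bits."""
    digest = hashlib.sha256(msg.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - N_BITS)

def step(x: int) -> int:
    """tau_eps o H: hash x and drop the EPS low-order output bits."""
    return toy_hash(x) >> EPS

def floyd_collision(f, x0):
    """Return (a, b) with a != b and f(a) == f(b), or None if x0 lies on the cycle."""
    tortoise, hare = f(x0), f(f(x0))
    while tortoise != hare:                 # phase 1: meet inside the cycle
        tortoise, hare = f(tortoise), f(f(hare))
    tortoise = x0                           # phase 2: walk towards the cycle entry
    if tortoise == hare:
        return None                         # no tail, hence no collision here
    while True:
        next_t, next_h = f(tortoise), f(hare)
        if next_t == next_h:                # both predecessors of the cycle entry
            return tortoise, hare
        tortoise, hare = next_t, next_h

def memoryless_near_collision():
    """Retry with new start values until a collision of step() is found."""
    x0 = 0
    while True:
        pair = floyd_collision(step, x0)
        if pair is not None:
            return pair
        x0 += 1
```

The two returned messages collide under `step`, so their full toy hash values can differ only in the `EPS` truncated positions.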

2. A Thorough Analysis of the Truncation Approach 

As indicated in [12], a simple idea to improve the truncation-based approach
is to truncate more than $\epsilon$ bits. That is, in order to find an
$\epsilon$-near-collision we simply truncate $\mu$ bits with $\mu > \epsilon$. A
cycle-finding method applied to $\tau_\mu \circ H$ has an expected complexity of
$2^{(n-\mu)/2}$ and deterministically finds two messages $m, m^*$ such that
$d(H(m), H(m^*)) \le \mu$. However, we can look at the probability that these
two messages $m, m^*$ satisfy $d(H(m), H(m^*)) \le \epsilon$, which is
$2^{-\mu} \sum_{i=0}^{\epsilon} \binom{\mu}{i} = 2^{-\mu} S_\mu(\epsilon)$.

For a truly memoryless approach, multiple runs of the cycle-finding algorithm are
interpreted as independent events. Therefore, the expected complexity to find an
$\epsilon$-near-collision can be obtained as the product of the expected
complexity to find a cycle and the expected number of repetitions of the
cycle-finding algorithm, i.e. the reciprocal value of the probability that a
single run finds an $\epsilon$-near-collision. In other words, we end up with an
expected complexity of

$$2^{(n+\mu)/2} S_\mu(\epsilon)^{-1} = 2^{(n+\mu)/2} \left( \sum_{i=0}^{\epsilon} \binom{\mu}{i} \right)^{-1}. \tag{2}$$

Remark 2. In [12], the above approach was already proposed with
$\mu = 2\epsilon + 1$. In this case (2) results in a complexity of

$$2^{(n+2\epsilon+1)/2} S_{2\epsilon+1}(\epsilon)^{-1} = 2^{(n+1)/2-\epsilon},$$

which clearly improves upon Lemma 2. Here we have used that
$S_{2\epsilon+1}(\epsilon) = \frac{1}{2} S_{2\epsilon+1}(2\epsilon+1) = 2^{2\epsilon}$.
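
The identity used in Remark 2 and the resulting exponent can be checked numerically. The helper names below are hypothetical and the value $n = 160$ is only an example.

```python
from math import comb, log2

def ball_size(n: int, r: int) -> int:
    """S_n(r): number of n-bit words of Hamming weight at most r."""
    return sum(comb(n, i) for i in range(r + 1))

def log2_complexity(n: int, eps: int, mu: int) -> float:
    """Base-2 logarithm of (2): 2^((n + mu)/2) / S_mu(eps)."""
    return (n + mu) / 2 - log2(ball_size(mu, eps))

for eps in range(1, 9):
    mu = 2 * eps + 1
    # Half of all (2*eps+1)-bit words have weight <= eps.
    assert ball_size(mu, eps) == 2 ** (2 * eps)
    print(eps, log2_complexity(160, eps, mu), (160 + 1) / 2 - eps)
```

For each $\epsilon$ the two printed exponents agree, in line with the closed form $2^{(n+1)/2-\epsilon}$.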

An interesting question that now arises is to find the number of truncated bits
$\mu$ that constitutes the best trade-off between a larger $\mu$, i.e. a faster
cycle-finding part, and a higher number of repetitions for this probabilistic
approach. In other words, we would like to determine the value of $\mu$ which
minimizes (2) for a given $\epsilon$. Equivalently, we can search for an integer
$\mu > \epsilon$ such that for a given $\epsilon$ the expression
$2^{-\mu/2} S_\mu(\epsilon)$ is maximized. For small values of $\epsilon$, values
for $\mu$ were already computed in [12] by an exhaustive search. In this section,
we want to solve this problem analytically.

We first show a result that tells us something about the behavior of the sequence
of real numbers

$$a_\mu := 2^{-\mu/2} S_\mu(\epsilon) = 2^{-\mu/2} \sum_{i=0}^{\epsilon} \binom{\mu}{i}. \tag{3}$$

We want to note that based on the origin of the problem, we are only interested in
values $a_\mu$ for $\mu > \epsilon$. Our analysis is still valid starting with
$\mu = 1$. We will need the following two properties of sequences:

Definition 3. Let $a_\mu$ be a real-valued sequence.

(i) A sequence is called unimodal in $\mu$ if there exists an index $t$ such that
$a_1 \le a_2 \le \dots \le a_t$ and $a_t \ge a_{t+1} \ge a_{t+2} \ge \dots$ The
index $t$ is called a mode of the sequence.

(ii) A sequence $a_\mu$ is called log-concave if
$a_\mu^2 \ge a_{\mu-1} a_{\mu+1}$ holds for every $\mu$. If $\ge$ is replaced by
$>$, we speak of a strictly log-concave sequence.

Lemma 3. The sequence defined in (3) is strictly log-concave and therefore also
unimodal.

Proof. It is a well-known fact that a log-concave sequence is also unimodal, cf. for
example [18]. So in order to show that (3) is strictly log-concave we have to show
that for any $\epsilon \ge 1$,

$$S_\mu(\epsilon)^2 > S_{\mu-1}(\epsilon)\, S_{\mu+1}(\epsilon) \tag{4}$$

holds. By using the recursion for the binomial coefficient twice, we can transform
the inequality (4) into

which boils down to the inequality

By direct computation using the definition of the binomial coefficient, it is easy to
see that each summand on the left is strictly larger than the respective summand
on the right, simply because $\epsilon > i$. □

The strict log-concavity guarantees the existence of at most two adjacent indices
for which the sequence $a_\mu$ attains its global maximum. But if there were an
index $t$ such that $a_t = a_{t+1}$ is maximal, the definition of the sequence
$a_\mu$ in (3) shows that this would imply the existence of two positive integers
$a, b$ such that $a = \sqrt{2}\, b$, which is clearly not possible. Therefore,
the mode of the sequence is indeed unique.

In order to find the mode of $a_\mu$, we have to investigate some properties of
truncated sums of binomial coefficients. There are well-known bounds for the sum
$S_\mu(\epsilon)$, which yield upper and lower bounds for the optimal value of
$\mu$. As we are interested in an asymptotically correct approximation for the
optimal $\mu$, we need to derive an asymptotic expansion of $S_\mu(\epsilon)$,
which seems to be hard to find in the literature. Notationally, we use
$f(\mu) \sim g(\mu)$ if $\lim_{\mu \to \infty} f(\mu)/g(\mu) = 1$ and
$f(\mu) \asymp g(\mu)$ if there exist positive $c_1, c_2, \mu_0$ such that
$c_1 |g(\mu)| \le |f(\mu)| \le c_2 |g(\mu)|$ for all $\mu \ge \mu_0$.



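Proposition 1. Let $S_\mu(\epsilon) = \sum_{i=0}^{\epsilon} \binom{\mu}{i}$ and
define $\alpha := \epsilon/\mu$. If we assume that there exist constants
$c_1, c_2$ such that $0 < c_1 < \alpha < c_2 < \frac{1}{2}$, then we have

$$S_\mu(\epsilon) = \binom{\mu}{\epsilon} \left( \frac{\mu-\epsilon}{\mu-2\epsilon} - \frac{2\epsilon(\mu-\epsilon)}{(\mu-2\epsilon)^3} + O(\mu^{-2}) \right) \tag{5}$$

for $\epsilon, \mu \to \infty$ and thus

$$S_\mu(\epsilon) \sim \frac{\mu-\epsilon}{\mu-2\epsilon} \binom{\mu}{\epsilon}.$$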



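Proof. For $k \le \epsilon$ we have

$$\binom{\mu}{\epsilon-k} = \binom{\mu}{\epsilon} \prod_{i=0}^{k-1} \frac{\epsilon-i}{\mu-\epsilon+k-i} \le \binom{\mu}{\epsilon} \left( \frac{\epsilon}{\mu-\epsilon} \right)^{k}. \tag{6}$$

Because of the requirements in the proposition we have

$$\frac{\epsilon}{\mu-\epsilon} = \frac{\alpha}{1-\alpha} < \frac{c_2}{1-c_2} < 1.$$

For sake of notation we set $\beta := \frac{\alpha}{1-\alpha}$ and
$c := \frac{c_2}{1-c_2}$. This then leads to

$$\binom{\mu}{\epsilon} \le S_\mu(\epsilon) \le \binom{\mu}{\epsilon} \sum_{k=0}^{\epsilon} \beta^{k} \le \binom{\mu}{\epsilon} \frac{1}{1-\beta} = \binom{\mu}{\epsilon} \frac{\mu-\epsilon}{\mu-2\epsilon}. \tag{7}$$

From equation (7) we learn that $S_\mu(\epsilon) \asymp \binom{\mu}{\epsilon}$.

The following can be seen as a discrete version of Laplace's method to approximate
integrals (cf. [6]). We split the sum as

$$S_\mu(\epsilon) = \sum_{k=0}^{\epsilon} \binom{\mu}{k} = \sum_{0 \le k \le \epsilon-r} \binom{\mu}{k} + \sum_{\epsilon-r < k \le \epsilon} \binom{\mu}{k} = S_\mu(\epsilon-r) + \sum_{0 \le k < r} \binom{\mu}{\epsilon-k},$$

where $r = r(\mu)$ is such that $r = o(\mu)$ for $\mu \to \infty$. We will
determine $r$ later. Because of (6) and (7) we obtain

$$S_\mu(\epsilon-r) = O\!\left( c^{r} \binom{\mu}{\epsilon} \right).$$

This implies

$$S_\mu(\epsilon) = \binom{\mu}{\epsilon} \left( \sum_{0 \le k < r} \prod_{i=0}^{k-1} \frac{\epsilon-i}{\mu-\epsilon+k-i} + O(c^{r}) \right).$$

We now have a closer look at the product above:

$$\prod_{i=0}^{k-1} \frac{\epsilon-i}{\mu-\epsilon+k-i} = \exp\!\left( \sum_{i=0}^{k-1} \log \frac{\alpha - \frac{i}{\mu}}{1-\alpha+\frac{k-i}{\mu}} \right).$$

For $x, y$ close to $0$ we have

$$\log \frac{\alpha+x}{1-\alpha+y} = \log\beta + \frac{1}{\alpha}\, x - \frac{1}{1-\alpha}\, y + O(x^2+y^2).$$

Since $0 \le i < k < r$ and $r = o(\mu)$ we conclude

$$\log \frac{\alpha - \frac{i}{\mu}}{1-\alpha+\frac{k-i}{\mu}} = \log\beta - \frac{1}{1-\alpha} \cdot \frac{k}{\mu} - \frac{1-2\alpha}{\alpha(1-\alpha)} \cdot \frac{i}{\mu} + O\!\left( \frac{k^2}{\mu^2} \right),$$

where the error term is uniform in $0 \le k < r$. With this we get

$$\prod_{i=0}^{k-1} \frac{\epsilon-i}{\mu-\epsilon+k-i} = \beta^{k} \exp\!\left( \frac{1-2\alpha}{2\alpha(1-\alpha)} \cdot \frac{k}{\mu} - \frac{1}{2\alpha(1-\alpha)} \cdot \frac{k^2}{\mu} + O\!\left( \frac{k^3}{\mu^2} \right) \right)$$
$$= \beta^{k} \left( 1 + \frac{1-2\alpha}{2\alpha(1-\alpha)} \cdot \frac{k}{\mu} - \frac{1}{2\alpha(1-\alpha)} \cdot \frac{k^2}{\mu} + O\!\left( \frac{k^4}{\mu^2} \right) \right).$$

In total we obtain that $S_\mu(\epsilon) / \binom{\mu}{\epsilon}$ is equal to

$$\sum_{0 \le k < r} \beta^{k} \left( 1 + \frac{1-2\alpha}{2\alpha(1-\alpha)} \cdot \frac{k}{\mu} - \frac{1}{2\alpha(1-\alpha)} \cdot \frac{k^2}{\mu} + O\!\left( \frac{k^4}{\mu^2} \right) \right)$$

up to an error term which is bounded by $O(c^{r})$. Since

$$\sum_{0 \le k < r} \beta^{k} \cdot \frac{k^4}{\mu^2} = O(\mu^{-2})$$

and

$$\sum_{k \ge r} \beta^{k} \left( 1 + \frac{1-2\alpha}{2\alpha(1-\alpha)} \cdot \frac{k}{\mu} - \frac{1}{2\alpha(1-\alpha)} \cdot \frac{k^2}{\mu} \right) = O(r^2 c^{r}),$$

it follows that $S_\mu(\epsilon) / \binom{\mu}{\epsilon}$ is equal to

$$\sum_{k \ge 0} \beta^{k} \left( 1 + \frac{1-2\alpha}{2\alpha(1-\alpha)} \cdot \frac{k}{\mu} - \frac{1}{2\alpha(1-\alpha)} \cdot \frac{k^2}{\mu} \right) + O(\mu^{-2} + r^2 c^{r}).$$

Simplifying the infinite sum above yields

$$S_\mu(\epsilon) \Big/ \binom{\mu}{\epsilon} = \frac{1-\alpha}{1-2\alpha} - \frac{2\alpha(1-\alpha)}{(1-2\alpha)^3} \cdot \frac{1}{\mu} + O(\mu^{-2} + r^2 c^{r}).$$

We now choose $r = r(\mu) = (\log\mu)^2$, since then $r^2 c^{r} = o(\mu^{-2})$,
which readily implies the statement using the definition of $\alpha$. □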
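
A quick numerical sanity check of the expansion (5) may be helpful. The sketch below fixes $\alpha = \epsilon/\mu = 1/4$ (an arbitrary choice inside the admissible range) and compares the exact quotient $S_\mu(\epsilon)/\binom{\mu}{\epsilon}$ with the one-term and two-term approximations; all function names are hypothetical.

```python
from math import comb

def exact_ratio(mu: int, eps: int) -> float:
    """S_mu(eps) / binom(mu, eps), computed with exact integers first."""
    s = sum(comb(mu, i) for i in range(eps + 1))
    return s / comb(mu, eps)

def one_term(mu: int, eps: int) -> float:
    """Leading term (mu - eps) / (mu - 2 eps) of (5)."""
    return (mu - eps) / (mu - 2 * eps)

def two_term(mu: int, eps: int) -> float:
    """Leading term plus the second-order correction of (5)."""
    return one_term(mu, eps) - 2 * eps * (mu - eps) / (mu - 2 * eps) ** 3

for eps in (25, 50, 100, 200):
    mu = 4 * eps                      # alpha = 1/4
    exact = exact_ratio(mu, eps)
    print(mu, abs(exact - one_term(mu, eps)), abs(exact - two_term(mu, eps)))
```

If (5) holds, the first error column should shrink roughly like $1/\mu$ and the second roughly like $1/\mu^2$.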

The results of Lem. 3 and Prop. 1 can now be combined in the following way.
We are interested in the behavior of $a_\mu$, that is,

$$2^{-\mu/2} S_\mu(\epsilon) = 2^{-\mu/2} \sum_{i=0}^{\epsilon} \binom{\mu}{i}.$$

We have already seen that there is a unique mode $t$ for the sequence. Below this
index, we have $a_{\mu+1}/a_\mu > 1$ and for all following values of $\mu$, we
have $a_{\mu+1}/a_\mu < 1$. If we evaluate the fraction, we get
$a_{\mu+1}/a_\mu = S_{\mu+1}(\epsilon) / (\sqrt{2}\, S_\mu(\epsilon))$. From the
recurrence relation of the binomial coefficient we get the analogous recurrence
relation for $S_\mu(\epsilon)$, namely
$S_{\mu+1}(\epsilon) = S_\mu(\epsilon) + S_\mu(\epsilon-1) = 2 S_\mu(\epsilon) - \binom{\mu}{\epsilon}$.
If we use this in the above equation we end up with

$$\frac{a_{\mu+1}}{a_\mu} = \frac{1}{\sqrt{2}} \left( 2 - \frac{\binom{\mu}{\epsilon}}{S_\mu(\epsilon)} \right). \tag{8}$$

If we now use the asymptotic expansion in (5) we can compute an approximation
for $\mu = \mu(\epsilon)$ such that an optimum for (2) is found.
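
The recurrence for $S_\mu(\epsilon)$ and the resulting expression for the quotient $a_{\mu+1}/a_\mu$ are easy to confirm numerically. This is a minimal check with hypothetical helper names `S` and `a`; the parameter ranges are arbitrary.

```python
from math import comb, isclose, sqrt

def S(mu: int, eps: int) -> int:
    """Truncated binomial sum S_mu(eps)."""
    return sum(comb(mu, i) for i in range(eps + 1))

def a(mu: int, eps: int) -> float:
    """Sequence a_mu = 2^(-mu/2) * S_mu(eps) from (3)."""
    return 2 ** (-mu / 2) * S(mu, eps)

for eps in range(1, 6):
    for mu in range(1, 40):
        # Recurrence inherited from Pascal's rule.
        assert S(mu + 1, eps) == 2 * S(mu, eps) - comb(mu, eps)
        # Quotient of consecutive terms, directly and via the recurrence.
        direct = a(mu + 1, eps) / a(mu, eps)
        via_recurrence = (2 - comb(mu, eps) / S(mu, eps)) / sqrt(2)
        assert isclose(direct, via_recurrence, rel_tol=1e-12)
```

The mode is the first index at which this quotient drops below 1, which is what the asymptotic analysis locates.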

Theorem 2. Let $H$ be a hash function producing an $n$-bit hash value and let
$\epsilon \ge 1$ be given. Let
$\tau_\mu : \mathbb{Z}_2^n \to \mathbb{Z}_2^{n-\mu}$ be a map that truncates
$\mu$ fixed bits from an $n$-bit value, and suppose we apply a cycle-finding
algorithm to $\tau_\mu \circ H$, which is assumed to act like a random mapping.
Then, there exists a unique optimal choice $\mu = \mu(\epsilon) > \epsilon$ to
find an $\epsilon$-near-collision. For large $\epsilon$, we have

$$\mu(\epsilon) = (2+\sqrt{2})(\epsilon-1) + O(\epsilon^{-1}). \tag{9}$$



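
Proof. Substituting the lower bound

$$S_\mu(\epsilon) \ge \binom{\mu}{\epsilon} + \binom{\mu}{\epsilon-1} = \frac{\mu+1}{\mu-\epsilon+1} \binom{\mu}{\epsilon}$$

and the upper bound of (7) in (8) implies that the mode $t$ of the sequence is
bounded by $(1+\sqrt{2})\epsilon - 1 < t < (2+\sqrt{2})\epsilon$. For values of
$\mu$ in the domain above we may use Prop. 1, since the quotient $\epsilon/\mu$
is easily seen to be bounded in the right way. Furthermore,
$\mu \asymp \epsilon$ and $\mu - 2\epsilon \asymp \epsilon$. For large values of
$\epsilon$ we infer from

$$S_\mu(\epsilon) = \binom{\mu}{\epsilon} \left( \frac{\mu-\epsilon}{\mu-2\epsilon} + O(\epsilon^{-1}) \right)$$

that the mode $t$ must satisfy the equation

$$\frac{1}{\sqrt{2}} \left( 2 - \frac{t-2\epsilon}{t-\epsilon} + O(\epsilon^{-1}) \right) = 1.$$

Solving this equation yields $t = (2+\sqrt{2})\epsilon + O(1)$. Now let us try to
obtain further terms of the asymptotic expansion of $t$ using bootstrapping (see
for instance [6]). Using the full strength of Prop. 1 implies that the equation

$$\frac{1}{\sqrt{2}} \left( 2 - \left( \frac{t-\epsilon}{t-2\epsilon} - \frac{2\epsilon(t-\epsilon)}{(t-2\epsilon)^3} + O(\epsilon^{-2}) \right)^{-1} \right) = 1$$

must be satisfied by the mode $t$. Using the ansatz
$t = (2+\sqrt{2})\epsilon + r$, where $r = O(1)$, yields

$$2(1+\sqrt{2}) \left( (3-2\sqrt{2})\, r + (2-\sqrt{2}) \right) \epsilon^2 + O(\epsilon) = 0.$$

Hence we get $r = -(2+\sqrt{2}) + O(\epsilon^{-1})$ and
$t = (2+\sqrt{2})(\epsilon-1) + O(\epsilon^{-1})$, which corresponds to
$\mu(\epsilon)$. □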

We want to note that in both Prop. 1 and Th. 2 it is possible to compute an
arbitrary number of terms of the asymptotic expansions (5) and (9). We end this
section with Table 1 demonstrating the quality of the approximation of (9). The
actual values for $\mu(\epsilon)$ are produced by an exhaustive search and, for
simplicity, (9) is replaced with $\lceil (2+\sqrt{2})(\epsilon-1) \rceil$.

TABLE 1. Comparison of $\mu(\epsilon)$ and
$\mu^*(\epsilon) := \lceil (2+\sqrt{2})(\epsilon-1) \rceil$ for certain values of
$\epsilon$.

$\epsilon$         1  2  3   4  ...   8   9  10  ...   98   99  100
$\mu(\epsilon)$    2  5  8  11  ...  25  28  32  ...  332  335  339
$\mu^*(\epsilon)$  0  4  7  11  ...  24  28  31  ...  332  335  339
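
The exhaustive search behind the first row of Table 1 is small enough to sketch. The code below is an assumption-laden illustration rather than the authors' program: it relies on the unimodality from Lemma 3 and stops at the first index where the sequence decreases, using the exact integer test $S_{\mu+1}(\epsilon)^2 > 2 S_\mu(\epsilon)^2$ in place of floating-point comparisons. The function names are hypothetical.

```python
from math import ceil, comb, sqrt

def S(mu: int, eps: int) -> int:
    """Truncated binomial sum S_mu(eps)."""
    return sum(comb(mu, i) for i in range(eps + 1))

def optimal_mu(eps: int) -> int:
    """Mode of a_mu = 2^(-mu/2) S_mu(eps), found by walking up from mu = 1."""
    mu = 1
    # a_{mu+1} > a_mu  is equivalent to  S_{mu+1}^2 > 2 * S_mu^2.
    while S(mu + 1, eps) ** 2 > 2 * S(mu, eps) ** 2:
        mu += 1
    return mu

def approx_mu(eps: int) -> int:
    """Rounded leading term of (9)."""
    return ceil((2 + sqrt(2)) * (eps - 1))

for eps in (1, 2, 3, 4, 8):
    print(eps, optimal_mu(eps), approx_mu(eps))
```

For the small values of $\epsilon$ printed here the two columns differ by at most a few units, consistent with the table.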



3. Limitations of Memoryless Near-Collisions 

A drawback of the truncation-based solution is of course that we can only find
$\epsilon$-near-collisions of a limited shape (depending on the fixed bit
positions), so only a fraction of all possible $\epsilon$-near-collisions can be
detected, namely $S_\mu(\epsilon) / S_n(\epsilon)$.

To improve upon this, the idea of [12, 13] is to replace the projection
$\tau_\epsilon$ by a more complicated function $g$, where $g$ is the decoding
operation of a certain covering code $C$. Let $R = R(C)$ be the covering radius
of a code $C$, that is
$R(C) = \max_{x \in \mathbb{Z}_2^n} \min_{c \in C} d(x, c)$.

Theorem 3 ([12]). Let $H$ be a hash function of output size $n$. Let $C$ be a
covering code of the same length $n$, size $K$ and covering radius $R(C)$, and
assume there exists an efficiently computable map
$g : \mathbb{Z}_2^n \to C$, $x \mapsto c$ with $d(x, c) \le R(C)$. If we further
assume that $g \circ H$ acts like a random mapping, in the sense that the
expected cycle and tail lengths are the same as for the iteration of a truly
random mapping on a space of size $K$, then we can find $2R(C)$-near-collisions
for $H$ with a complexity of about $\sqrt{K}$ and with virtually no memory
requirements.

An extensive amount of work in the theory of covering codes is devoted to derive 
upper and lower bounds for K (when n and R are given) and to construct codes 
achieving these bounds (cf. [5, 19, 21]). The authors of [13] have investigated a 
class of efficient codes suitable for the approach outlined in Th. 3. The approach 
via covering codes constitutes an improvement over the purely truncation based 
approach. However, (depending on $\epsilon$) the query-complexity of the approach outlined
in Th. 3 is larger than the expected query-complexity of the table-based birthday 
method, cf. Lem. 1. 

Remark 3. We briefly want to mention the possibility of considering a probabilistic
version of the covering code approach in an analogous manner to the approach in
Sec. 2. In other words, what is the probability to find a $(2R-1)$-near-collision
if the covering radius is $R$? This problem has also been studied in [12] with
the outcome that in general, finding a closed expression like (2) is beyond
reach. Numerical experiments for relevant values of $n$ and $\epsilon = 2R$ show
that increasing the covering radius rarely brings an improvement. We use
[12, Eq. (20)] together with the optimal solution from [13] to compute
complexities for small values of $\epsilon$ in Table 3.

The limitations of the covering code approach are inherent to the sphere covering
bound, which states that $K \ge 2^n / S_n(R)$ (cf. [5]). Since we use codes with
covering radius $R$ to find $2R$-near-collisions, that is, $\epsilon = 2R$, the
sphere covering bound implies that the size $K$ of the code has to satisfy
$K \ge 2^n / S_n(R) > 2^n / S_n(2R)$, where the latter would be the desired
quantity to match the complexity of Lem. 1 to find an $\epsilon$-near-collision.

In the following, we want to investigate whether there are other possibilities to
choose a mapping $g$ such that collisions for $g \circ H$ imply
$\epsilon$-near-collisions for $H$. In [12] it was shown that the "perfect"
mapping $g$ is beyond reach:

Lemma 4 ([12]). Let $\epsilon \ge 1$, let $H$ be a hash function and let $g$ be a
function such that

$$d(H(m), H(m^*)) \le \epsilon \iff g(H(m)) = g(H(m^*))$$

holds. Then, $g$ is a constant map and $d(H(m), H(m^*)) \le \epsilon$ for all
$m, m^*$.

So the best we can hope for is a mapping
$g : \mathbb{Z}_2^n \to \mathbb{Z}_2^k$ that satisfies

$$g(y) = g(y') \implies d(y, y') \le \epsilon \tag{10}$$

for all $y, y' \in \mathbb{Z}_2^n$. If we recall the requirements of Th. 3, it
was stated that $g \circ H$ should act like a random mapping in order to have the
expected cycle and tail lengths of the iteration of $g \circ H$ be the same as
for a truly random mapping on a space of size $2^k$.

We formalize this in the following lemma. For this, we assume that the hash
function $H$ acts like a random mapping from a large domain
$D \cong \mathbb{Z}_2^\ell$ to $\mathbb{Z}_2^n$ (since most hash standards define
a maximum input length). First, we need yet another definition:

Definition 4. Let $D, I$ be finite domains. We call a function $g : D \to I$
balanced if $|I|$ divides $|D|$ and for all $z \in I$ we have
$|g^{-1}(z)| = |D| / |I|$.

Lemma 5. Let $H : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n$ be a random mapping.
Furthermore, consider a function $g : \mathbb{Z}_2^n \to \mathbb{Z}_2^k$ with
$k < n$. Then, $g$ is balanced if and only if
$g \circ H : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^k$ is a random mapping.

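Proof. Let $g$ be balanced, that is, for all $z \in \mathbb{Z}_2^k$ we have
$|g^{-1}(z)| = 2^{n-k}$. The sets $P_z := g^{-1}(z)$ for all
$z \in \mathbb{Z}_2^k$ define a disjoint partition of $\mathbb{Z}_2^n$ into sets
of size $|P_z| = 2^{n-k}$, and $g$ is constant on each set $P_z$.

Now let $H$ be drawn uniformly at random from the set of all functions
$\mathbb{Z}_2^\ell \to \mathbb{Z}_2^n$, that is, for any function
$h : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n$ we have
$\mathbb{P}(H = h) = 2^{-n 2^\ell}$. For a given
$h' : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^k$, we now want to compute the
probability $\mathbb{P}(g \circ H = h')$, for which we get

$$\mathbb{P}(g \circ H = h') = 2^{-n 2^\ell}\, \big|\{ h : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n \mid g(h(x)) = h'(x) \text{ for all } x \in \mathbb{Z}_2^\ell \}\big|$$
$$= 2^{-n 2^\ell}\, \big|\{ h : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n \mid h(x) \in P_{h'(x)} \text{ for all } x \in \mathbb{Z}_2^\ell \}\big| \tag{11}$$
$$= 2^{-n 2^\ell} \cdot 2^{(n-k) 2^\ell} = 2^{-k 2^\ell},$$

because $|P_z| = 2^{n-k}$ for all $z$. In other words, $g \circ H$ is a random
mapping.

Now assume that $g \circ H$ is a random mapping. That means that for every
$h' : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^k$ we have
$\mathbb{P}(g \circ H = h') = 2^{-k 2^\ell}$. This stays true if we choose $h'$
to be one of the $2^k$ constant functions. If we argue along the same lines as in
(11), we get

$$2^{-k 2^\ell} = 2^{-n 2^\ell}\, \big|\{ h : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n \mid g(h(x)) = c \text{ for all } x \in \mathbb{Z}_2^\ell \}\big|$$

for all $c \in \mathbb{Z}_2^k$. Again, with $P_c = g^{-1}(c)$, we have

$$2^{(n-k) 2^\ell} = \big|\{ h : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n \mid h(x) \in P_c \text{ for all } x \in \mathbb{Z}_2^\ell \}\big| = |P_c|^{2^\ell}.$$

This leaves us with $|P_c| = |g^{-1}(c)| = 2^{n-k}$ for all
$c \in \mathbb{Z}_2^k$, and thus, $g$ is balanced. □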
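
For very small parameters, the statement of Lemma 5 can be checked by brute force. The sketch below enumerates all $H : \mathbb{Z}_2^\ell \to \mathbb{Z}_2^n$ for the toy sizes $\ell = 1$, $n = 2$, $k = 1$ and counts how often each composed map occurs; the two example functions `balanced` and `unbalanced` are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import product

def composition_counts(g, ell: int, n: int) -> Counter:
    """Count, over all H: Z_2^ell -> Z_2^n, how often each map g o H occurs."""
    counts = Counter()
    # A function H is represented by its value table of length 2^ell.
    for h_values in product(range(2 ** n), repeat=2 ** ell):
        counts[tuple(g(y) for y in h_values)] += 1
    return counts

def is_random_mapping(counts: Counter, ell: int, k: int) -> bool:
    """True if all (2^k)^(2^ell) maps Z_2^ell -> Z_2^k occur equally often."""
    all_maps_hit = len(counts) == (2 ** k) ** (2 ** ell)
    return all_maps_hit and len(set(counts.values())) == 1

def balanced(y: int) -> int:
    """Preimages {0, 2} and {1, 3}: both of size 2^(n-k) = 2."""
    return y & 1

def unbalanced(y: int) -> int:
    """Preimages {0, 1, 2} and {3}: not balanced."""
    return int(y == 3)

print(is_random_mapping(composition_counts(balanced, 1, 2), 1, 1))
print(is_random_mapping(composition_counts(unbalanced, 1, 2), 1, 1))
```

With the balanced map every composed function occurs the same number of times, whereas the unbalanced map skews the distribution towards the constant-zero function.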

Lem. 5 teaches us, that in a memoryless near-collision algorithm based on the 
iteration of the concatenation of the hash function H and a function g, additionally 
to the requirement (10) we need g also to be balanced. In the remaining part of 
this section, we want to show that this limits our choices basically to the known 
candidates for g. 

For the proof of the next proposition, we will need a lemma which goes back
to a conjecture by Erdős. The solution of this problem by Kleitman in [10] was
further investigated in [1]. Let $\mathrm{diam}(A)$ be the diameter of a set
$A \subseteq \mathbb{Z}_2^n$, i.e.,
$\mathrm{diam}(A) := \max_{x, y \in A} d(x, y)$. We now collect the results of
Th. 1 and Th. 2 of [1] in the following lemma:

Lemma 6. Let $s$ be a non-negative integer.

(i) The Hamming balls $B_s(x)$ for any $x \in \mathbb{Z}_2^n$ are the sets of
maximal size among all sets $A \subseteq \mathbb{Z}_2^n$ with
$\mathrm{diam}(A) = 2s < n - 1$.

(ii) The sets $B_s(x) \cup B_s(y)$ for any $x, y \in \mathbb{Z}_2^n$ with
$d(x, y) = 1$ are sets of maximal size among all sets
$A \subseteq \mathbb{Z}_2^n$ with $\mathrm{diam}(A) = 2s + 1 < n - 1$.
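
For a tiny dimension, the extremal sizes in Lemma 6 can be confirmed by exhaustive search. The following sketch (with the hypothetical helper `max_set_size`) tries all subsets of $\mathbb{Z}_2^4$ and is only feasible because there are just $2^{16}$ of them.

```python
from math import comb

def max_set_size(n: int, diameter: int) -> int:
    """Largest subset of Z_2^n whose pairwise Hamming distances are <= diameter."""
    points = range(2 ** n)
    # close[x] is a bitmask of all points y with d(x, y) <= diameter.
    close = [
        sum(1 << y for y in points if bin(x ^ y).count("1") <= diameter)
        for x in points
    ]
    best = 0
    for mask in range(1, 2 ** (2 ** n)):        # every non-empty subset
        size = bin(mask).count("1")
        if size <= best:
            continue
        # The subset is valid if every member is close to all other members.
        if all(mask & ~close[x] == 0 for x in points if mask >> x & 1):
            best = size
    return best

ball = sum(comb(4, i) for i in range(2))        # S_4(1) = 5
print(max_set_size(4, 2), ball)                 # diameter 2 = 2s with s = 1
print(max_set_size(4, 1))                       # diameter 1 = 2s + 1 with s = 0
```

The first line should report the size of a Hamming ball of radius 1, the second the size of two adjacent points, matching cases (i) and (ii).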

With this auxiliary result, we can now formulate the main result of this section.

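Theorem 4. Let $1 \le \epsilon < \frac{n}{2}$ be given and let
$g : \mathbb{Z}_2^n \to I$ be a balanced function satisfying (10), that is,

$$g(y) = g(y') \implies d(y, y') \le \epsilon$$

for all $y, y' \in \mathbb{Z}_2^n$. Then, $|I|$ must satisfy

$$|I| \ge 2^n / S_n(\lceil \epsilon/2 \rceil). \tag{12}$$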

Proof. In the proof of Lem. 5 we have seen that the balancedness of $g$ implies a
disjoint partition $\bigcup_z P_z$ of $\mathbb{Z}_2^n$ where the size of each set
$P_z$ is $2^n / |I|$. The sets $P_z$ are exactly such that $g(x) = z$ for all
$x \in P_z$. Taking property (10) into account, we need that
$\mathrm{diam}(P_z) \le \epsilon$. Therefore, Lem. 6 teaches us that if
$\epsilon$ is even, we have $2^n / |I| \le S_n(\epsilon/2)$, and
$2^n / |I| \le S_n\!\left(\frac{\epsilon-1}{2}\right) + \binom{n-1}{(\epsilon-1)/2}$
for odd $\epsilon$. Since
$S_n\!\left(\frac{\epsilon-1}{2}\right) + \binom{n-1}{(\epsilon-1)/2} \le S_n\!\left(\frac{\epsilon+1}{2}\right)$,
we can unify the expressions to (12). □

As a consequence we get the following corollary: 

Corollary 1. Let $H$ be an $n$-bit hash function, let
$1 \le \epsilon < \frac{n}{2}$ and let $g$ be a balanced function satisfying
(10). Then, the complexity to find an $\epsilon$-near-collision by applying a
cycle-finding algorithm to the concatenation $g \circ H$ is bounded from below by
$\Omega(2^{n/2} S_n(\lceil \epsilon/2 \rceil)^{-1/2})$.
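
To see the gap between the bound of Corollary 1 and the table-based method of Lemma 1 in numbers, one can evaluate both exponents directly. The function names are hypothetical, constants hidden in the $O$ and $\Omega$ notation are ignored, and $n = 160$ is just an example.

```python
from math import comb, log2

def S(n: int, r: int) -> int:
    """Size of an n-dimensional Hamming ball of radius r."""
    return sum(comb(n, i) for i in range(r + 1))

def log2_birthday(n: int, eps: int) -> float:
    """log2 of 2^(n/2) * S_n(eps)^(-1/2), the table-based cost of Lemma 1."""
    return n / 2 - log2(S(n, eps)) / 2

def log2_memoryless_bound(n: int, eps: int) -> float:
    """log2 of 2^(n/2) * S_n(ceil(eps/2))^(-1/2), the bound of Corollary 1."""
    return n / 2 - log2(S(n, (eps + 1) // 2)) / 2

for eps in range(1, 9):
    print(eps, round(log2_birthday(160, eps), 1),
          round(log2_memoryless_bound(160, eps), 1))
```

For $\epsilon \ge 2$ the second column is strictly larger than the first, which is the quantitative content of the limitation shown in this section.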



4. Conclusion 

At the moment, a lot of effort is dedicated to the cryptanalysis of concrete hash 
function designs. From a theoretical perspective it is still very important to investi- 
gate generic aspects of non-random properties of hash functions. In this paper, we 
have analyzed several aspects of the question of finding near-collisions in a mem- 
oryless way. This problem has recently been investigated in [12, 13]. All these 
methods rely on the application of a cycle-finding technique to an alteration (that 
is, concatenation with a new mapping) of the hash function. We have investigated 
in full detail the complexity of a probabilistic version of the simple truncation based 
approach. Furthermore, we have shown that the approach in general is limited in 
its capabilities, in the sense that if $g$ is such that finding a collision for $g \circ H$ implies
a near-collision for H, the query-complexity of this approach is always higher than 
the query-complexity of a birthday-like method using a table of exponential size. A 
comparison of the known methods is compiled in Tables 2 and 3. It has to be noted 
that in practice the real complexity of the table-based method will be dominated by 
the table queries and not by the hash computations. 

Table 2. Methods for finding $\epsilon$-near-collisions of an $n$-bit hash
function $H$.

short explanation         memory                      complexity                          remarks
------------------------  --------------------------  ----------------------------------  ------------------------------------
cycle finding approach    negligible (memory is       $2^{(n-\epsilon)/2}$                cf. Lemma 2 and [9]
applied to an             only required for cycle
$\epsilon$-truncation     finding)
of $H$

cycle finding approach    negligible (memory is       $2^{(n+1)/2-\epsilon}$              cf. Remark 2 and [12];
applied to a              only required for cycle                                         (A) in Table 3
$(2\epsilon+1)$-          finding)
truncation of $H$

cycle finding approach    negligible (memory is       $2^{(n+\mu)/2} S_\mu(\epsilon)^{-1}$  optimal $\mu = \mu(\epsilon)$ is unique
applied to an optimized   only required for cycle                                         and $\mu \sim (2+\sqrt{2})(\epsilon-1)$,
$\mu$-truncation of $H$   finding)                                                        cf. Theorem 2; (B) in Table 3
($\mu > \epsilon$)

table based approach      a table of exponential      $2^{n/2} S_n(\epsilon)^{-1/2}$      cf. Lemma 1 and [12];
                          size in $n$ for the pairs                                       (C) in Table 3
                          $(m, H(m))$

coding based approach     negligible (memory is       for even $\epsilon = 2R$:           cf. [12, 13]; (D) in Table 3;
                          only required for coding    $2^{(n-R\ell-r)/2}$, where          for odd $\epsilon$ the coding based
                          and cycle finding)          $\ell := \lfloor \log_2(n/R+1) \rfloor$,  approach for $\epsilon+1$ is repeated
                                                      $r := \lfloor (n-R(2^\ell-1))/2^\ell \rfloor$  until an $\epsilon$-near-collision is
                                                                                          found, cf. Remark 3
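
The complexity of the coding based approach for even $\epsilon = 2R$ can be evaluated with a few lines of code. The sketch assumes the formula $2^{(n-R\ell-r)/2}$ with $\ell = \lfloor \log_2(n/R+1) \rfloor$ and $r = \lfloor (n-R(2^\ell-1))/2^\ell \rfloor$ as read from the last row of Table 2; the function name is hypothetical.

```python
from math import floor, log2

def log2_coding_complexity(n: int, R: int) -> float:
    """log2 of the coding based complexity for epsilon = 2R (assumed formula)."""
    ell = floor(log2(n / R + 1))
    r = (n - R * (2 ** ell - 1)) // 2 ** ell
    return (n - R * ell - r) / 2

for n in (160, 256, 512):
    print(n, [log2_coding_complexity(n, R) for R in (1, 2, 3, 4)])
```

The printed exponents can be compared with the even-$\epsilon$ entries of column (D) in Table 3.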



Table 3. For given $\epsilon \in \{1, \dots, 8\}$ and hash length
$n \in \{160, 256, 512\}$, the table compares the base-2 logarithms of the
complexities (A) - (D) of Table 2, together with (E) which is the bound of
Corollary 1.

                  n = 160                         n = 256                              n = 512
$\epsilon$   (A)   (B)   (C)   (D)   (E)     (A)    (B)    (C)    (D)    (E)     (A)    (B)    (C)    (D)    (E)
    1       79.5  79.4  76.3  81.9  76.3   127.5  127.4  124.0  130.4  124.0   255.5  255.4  251.5  258.9  251.5
    2       78.5  78.5  73.2  76.5  76.3   126.5  126.5  120.5  124.0  124.0   254.5  254.5  247.5  251.5  251.5
    3       77.5  77.5  70.3  77.5  73.2   125.5  125.5  117.3  125.4  120.5   253.5  253.5  243.8  253.4  247.5
    4       76.5  76.4  67.7  74.0  73.2   124.5  124.4  114.3  121.0  120.5   252.5  252.4  240.3  248.0  247.5
    5       75.5  75.2  65.2  74.0  70.3   123.5  123.2  111.5  121.7  117.3   251.5  251.2  237.0  249.1  243.8
    6       74.5  74.1  62.8  71.5  70.3   122.5  122.1  108.8  118.5  117.3   250.5  250.1  233.8  245.0  243.8
    7       73.5  72.9  60.6  71.3  67.7   121.5  120.9  106.2  118.5  114.3   249.5  248.9  230.7  245.5  240.3
    8       72.5  71.7  58.5  69.5  67.7   120.5  119.7  103.7  116.0  114.3   248.5  247.7  227.7  242.0  240.3